Problem
I have JSON which comes in bad shape:
data = [
{"ids": [1]},
{"ids": [3, 4]},
{"ids": [1, 2]},
{"ids": [4]},
{"ids": [3]},
{"ids": [2]},
] # LD. List of dictionaries.
I want it to get in shape, like this:
expected = [
[{"ids": [1]}, {"ids": [4]}, {"ids": [3]}, {"ids": [2]}], # Length = 1
[{"ids": [3, 4]}, {"ids": [1, 2]}], # Length = 2
] # LOLD. List is now list-of-lists of dictionaries.
To simplify the problem, we can strip the single key-value dictionaries, keeping in mind that we must reconstruct them later:
# in
[
[3, 4], [1, 2],
[1], [4], [3], [2]
]
# out
[
[[3, 4], [1, 2]],
[[1], [4], [3], [2]]
]
This is very easy. Conceptually it is assemble . op . disassemble $ data, which in Python becomes:
from itertools import groupby

def main(ids):
    return [list(x) for x in assemble(cardinality_groups(disassemble(ids)))]

def cardinality_groups(lol):
    # groupby only merges adjacent items, so sort by length first
    return [list(group) for _, group in groupby(sorted(lol, key=len), key=len)]

def assemble(data):
    return [tag_datum(x) for x in data]

def tag_datum(datum):
    return [{"ids": x} for x in datum]

def disassemble(ids):
    return [x['ids'] for x in ids]
But I insist: it must be simpler, purer! I am not sure Python has the amenities to make this prettier, though, so please also suggest functionality present in other languages.
I am curious about two ways the program can expand here, and some other things:
- By the complexity and nesting of the data. Data can take the form of any JSON found in the wild. Here we simply descend a few levels down.
- By the operation performed. Here grouping by cardinality solved the issue. In another world we want no two sets to intersect. Are there more complex operations, what are they, and do they break anything?
- Assembly-disassembly symmetry. The two should be each other’s inverse, so can I deduce one from the other and avoid coding both? Does any language provide such tools?
- Beyond typing, are there any languages that support describing how data looks when it comes in, and how it will look when it comes out? Not just the top-level type, but the shape of the data the code works with at that level of abstraction. My presumption is that many programs lend themselves well to this type of reasoning.
- I don’t like f(g(h(x))). I like f . g . h $ x; it’s purer. It really bothers me that I can’t do something like this in Python or JavaScript, two of the most popular languages! Consequently, I frequently find myself doing something like this:
someValue = dostuff(someInput)
valueIsNowSlightlyChanged = doMoreStuff(someValue)
iAmLosingTrack = doStuffMoreNow(valueIsNowSlightlyChanged)
final = wtf(iAmLosingTrack)
return final
Or variations thereof. At this point I don’t feel like using either language. Doing things this way isn’t, of course, always going to be possible, but I don’t even get the opportunity. Am I confused, or do I have a point? And do you perhaps have a solution to my supposed confusion?
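For reference, one way to approximate f . g . h $ x in plain Python is a small helper built on functools.reduce. This is only a sketch: compose is a hypothetical helper (it is not part of the standard library), and the three stages are condensed versions of the functions above, repeated so the snippet runs on its own.

```python
from functools import reduce
from itertools import groupby


def compose(*functions):
    """Right-to-left composition: compose(f, g, h)(x) == f(g(h(x)))."""
    # Start from the identity function and wrap one function per step.
    return reduce(lambda f, g: lambda x: f(g(x)), functions, lambda x: x)


def disassemble(records):
    # Strip the single-key wrapper: {"ids": [...]} -> [...]
    return [record["ids"] for record in records]


def cardinality_groups(lists):
    # groupby only merges adjacent items, so sort by length first.
    return [list(group) for _, group in groupby(sorted(lists, key=len), key=len)]


def assemble(groups):
    # Restore the wrapper inside every group.
    return [[{"ids": ids} for ids in group] for group in groups]


# Reads like assemble . cardinality_groups . disassemble
main = compose(assemble, cardinality_groups, disassemble)

data = [{"ids": [1]}, {"ids": [3, 4]}, {"ids": [1, 2]}]
print(main(data))
# [[{'ids': [1]}], [{'ids': [3, 4]}, {'ids': [1, 2]}]]
```

This removes the nesting and the throwaway intermediate names, but it is still a function call rather than an operator, so it does not read quite like the Haskell version.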