Wormhole – Data Collection Cheat Sheet and Library in 4 Languages

During the Advent of Code this past year, I was trying to enhance my knowledge of Elixir, as well as functional programming in general. There were times when I found a function with no analogue in most other languages I’d used (a glaringly obvious one being Enum.reduce_while), and other times when I was rewriting functions I’d used often in other languages (Clojure’s frequencies would’ve been mighty handy!). I finally decided to bite the bullet: make a list of the collection manipulation functions I use often, implement them in the other languages I’ve learned or am currently trying to learn, and discover new functions that I wouldn’t want to live without!

What Is Wormhole?

It’s this, really:

jdsteinhauser / wormhole

Some of my most used functions implemented in different languages

Wormhole
You ever think, “Hey, I wish this language had the capability of some other language I like”? Enter Wormhole.

Motivation
During the Advent of Code 2018, I found myself writing the same functions in Elixir that
I knew I had used in Clojure, F#, or some other language. In order to prevent myself from doing this in the future, I
decided to build a library to house all of these helpful functions across languages that I either knew or
wanted to learn.

Desired functions
These are the functions that I’ve used and that I’d like to have in multiple languages. Some of them are already built
into a given language, so I won’t reimplement them there. Each implementation will list the functions as implemented in the language
as well as links to their documentation.

map
filter
reduce
reduce_while
chunk
chunk_by
juxt
min_by
max_by
frequencies
group_by

Languages

Currently Implemented

Elixir


I’ve got an addiction. I love learning new languages. When you learn new languages, you end up finding functions, classes, and concepts that you wish you had in other languages. Sometimes those functions go by different names, which gets confusing when you switch between languages. I end up doing a lot of data collection manipulation, so I decided to start with what I knew best and branch out from there!

What Functions Am I Looking For?

For a non-exhaustive list, I wanted to have at least the following:

Collection basics: map, filter, reduce, and scan

Chunking data: chunk, chunk_by

Common stats: min_by, max_by, group_by, frequencies

Other hella useful things: reduce_while, juxt, identity

What Languages Am I Targeting?

For now, I have filled in my perceived gaps in functions in C#, Clojure, and Elixir. I have an F# solution that I’ll be comfortable with early this week, and I’ve started looking at a comprehensive list of Ruby functions as well. After that… well, I’m not entirely sure! I think I’m going to go through Rust, JavaScript, Java, and possibly Kotlin and Python 3 to see what other handy things I can implement across all those languages.

Will These Be Deployed to Package Managers?

Yes… but not right now. I need to get the documentation to a suitable state. I’ve pulled down several packages before but I’ve never pushed mine up to any! I’m sure that will end up being a blog post in and of itself.

Current Cheat Sheet

Here’s a summary of the languages I’ve targeted so far, with documentation links to each function that either already exists, or that I’ve implemented in Wormhole.

| Function | C# | F# | Clojure | Elixir |
| --- | --- | --- | --- | --- |
| map | Enumerable.Select | Seq.map | clojure.core/map | Enum.map/2, Stream.map/2 |
| filter | Enumerable.Where | Seq.filter | clojure.core/filter | Enum.filter/2, Stream.filter/2 |
| reduce | Enumerable.Aggregate | Seq.reduce | clojure.core/reduce | Enum.reduce |
| reduce_while | ReduceWhile | reduceWhile | reduce-while | Enum.reduce_while/3 |
| scan | Scan | Seq.scan | clojure.core/reductions | Enum.scan, Stream.scan |
| chunk | Chunk | chunk* | clojure.core/partition | Enum.chunk_every/4, Stream.chunk_every/4 |
| chunk_by | ChunkBy | chunkBy | clojure.core/partition-by | Enum.chunk_by/2, Stream.chunk_by/2 |
| juxt | Juxt | juxt, juxt2, juxt3 | clojure.core/juxt | Wormhole.juxt/1 |
| min_by | MinBy | Seq.minBy | min-by | Enum.min_by/3 |
| max_by | MaxBy | Seq.maxBy | max-by | Enum.max_by/3 |
| frequencies | Frequencies | freqs | clojure.core/frequencies | Wormhole.freqs/1 |
| group_by | Enumerable.GroupBy | Seq.groupBy | clojure.core/group-by | Enum.group_by/3 |
| identity | Identity | Operators.id | clojure.core/identity | Wormhole.identity/1 |

*F# contains a Seq.windowed function, but it only moves the chunk one element at a time.

Why Is This Stuff Useful?

Well, some of the functions are either self-explanatory or already covered in several other articles. I’ll cover some of the lesser-known ones and explain why I personally found them useful.

Chunking

I’ve written about chunk and chunk_by before, but in case you missed it, check out my previous article!

Alright, Break It Up! Using Partition/ Chunk
Jason Steinhauser

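For readers who haven’t seen that article, here is a rough Python sketch of what chunk and chunk_by do. Python is not one of Wormhole’s languages, and the names and signatures below (chunk with a size and an optional step, chunk_by with a key function) are assumptions made for illustration rather than Wormhole’s API. Dropping an incomplete trailing chunk is also a choice made for this sketch.

```python
from itertools import groupby
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
K = TypeVar("K")


def chunk(items: Iterable[T], size: int, step: Optional[int] = None) -> List[List[T]]:
    """Split items into lists of `size`, starting a new chunk every `step` items.

    step defaults to size (non-overlapping chunks). An incomplete trailing
    chunk is dropped. The input is copied into a list first, so this sketch
    is not suitable for infinite sequences.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    step = size if step is None else step
    if step < 1:
        raise ValueError("step must be at least 1")
    buffered = list(items)
    return [buffered[i:i + size] for i in range(0, len(buffered) - size + 1, step)]


def chunk_by(items: Iterable[T], key: Callable[[T], K]) -> List[List[T]]:
    """Start a new chunk every time key(item) changes between neighbours."""
    return [list(group) for _, group in groupby(items, key)]


print(chunk([1, 2, 3, 4, 5, 6], 2))                  # [[1, 2], [3, 4], [5, 6]]
print(chunk([1, 2, 3, 4, 5], 3, step=1))             # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(chunk_by([1, 3, 2, 4, 5], lambda n: n % 2))    # [[1, 3], [2, 4], [5]]
```

With step=1 the result is a sliding window, which is the only behavior a one-element-at-a-time window function offers; the separate step parameter is what makes a general chunk more flexible.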

Reduce While

I’ll admit that this one probably doesn’t come up very often. Sometimes you don’t want to reduce an entire sequence, just up to a certain point. Unfortunately, reduce is typically all or nothing, which doesn’t really work when you have a potentially infinite series of data. Elixir’s reduce_while, however, helped me keep my solution for AoC 2018 Day 1 Part 2 compact. I’m hoping to find more real-world use cases for it, but it’s still one of my favorite data processing functions I’ve found.
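To make the idea concrete, here is a minimal Python sketch of a reduce that can stop early. It is loosely modeled on the continue/halt convention of Elixir’s Enum.reduce_while/3, but the CONT and HALT markers, the reduce_while signature, and the first_repeated_total helper are all assumptions made for this example. It is not my actual AoC solution, and it is not Wormhole’s code.

```python
from itertools import cycle
from typing import Callable, Iterable, Tuple, TypeVar

A = TypeVar("A")
T = TypeVar("T")

# Hypothetical markers telling the reducer whether to keep going.
CONT = "cont"
HALT = "halt"


def reduce_while(items: Iterable[T], initial: A,
                 step: Callable[[T, A], Tuple[str, A]]) -> A:
    """Fold over items until step returns HALT or the items run out."""
    acc = initial
    for item in items:
        signal, acc = step(item, acc)
        if signal == HALT:
            break
    return acc


def first_repeated_total(changes: Iterable[int]) -> int:
    """Cycle through changes forever; return the first running total seen twice.

    Never returns if no running total ever repeats.
    """
    def step(change, state):
        total, seen = state
        total += change
        if total in seen:
            return HALT, (total, seen)
        seen.add(total)
        return CONT, (total, seen)

    total, _ = reduce_while(cycle(changes), (0, {0}), step)
    return total


print(first_repeated_total([1, -2, 3, 1]))  # 2
```

Because the input here is an endless cycle, a plain reduce would never finish; the halt signal is what lets the fold return as soon as the answer is known.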

Juxt

I admit that, at first glance, juxt is nothing special. It takes an array of functions that operate on the same parameters and returns a single function that takes those parameters and returns an array containing the result of each function. Why use that?
I’ve ported this function from Clojure into other work projects before. For instance, I had a very large collection of data (1MM+ entries!) and I couldn’t afford to iterate over them multiple times. I used juxt to compose my analysis functions together so that I only had to iterate over the collection one time.
Similarly, since a keyword in Clojure can be treated as a function for retrieving a value out of a map with that key ((:foo {:foo 5 :bar 3}) returns 5), you can compose several keywords to pull data out of a collection of maps and return the results in a kind of table format. I wrote about that as part of a previous post on dense Clojure code:

A verbose explanation of compact code
Jason Steinhauser

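As a rough illustration of both uses described above, here is a small Python sketch of juxt. The juxt function below is an assumption written for this example and not Wormhole’s implementation, the people records are made-up sample data, and operator.itemgetter stands in for Clojure’s keywords-as-functions.

```python
from operator import itemgetter
from typing import Any, Callable, List


def juxt(*funcs: Callable[..., Any]) -> Callable[..., List[Any]]:
    """Return a function that calls every func with the same arguments
    and collects the results, in order, into a list."""
    def juxtaposed(*args: Any, **kwargs: Any) -> List[Any]:
        return [f(*args, **kwargs) for f in funcs]
    return juxtaposed


# Made-up sample records, standing in for a collection of Clojure maps.
people = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Linus", "age": 28, "city": "Helsinki"},
]

# One pass over the collection; each record is visited once and
# turned into a row of the selected fields.
to_row = juxt(itemgetter("name"), itemgetter("age"))
rows = [to_row(person) for person in people]

print(rows)                          # [['Ada', 36], ['Linus', 28]]
print(juxt(min, max, sum)([3, 1, 2]))  # [1, 3, 6]
```

The second call shows the other side of the same idea: several analysis functions bundled into one, so the caller only has to hand the data over once.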

Frequencies

Because sometimes, you just need a histogram. frequencies provides that in one single function!
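For a sense of what that looks like, here is a tiny Python sketch. The frequencies name mirrors Clojure’s function, but this wrapper around collections.Counter is only an illustration and not Wormhole’s code.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def frequencies(items: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Map each distinct item to the number of times it appears."""
    return dict(Counter(items))


print(frequencies("mississippi"))  # {'m': 1, 'i': 4, 's': 4, 'p': 2}
```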

Conclusion

Hopefully someone out there will find this useful, either as a cheat sheet or as a library. In the near term, I will be investigating Ruby and Rust (in that order) to see what other handy functions I could foresee using across multiple languages. I’ll also put Wormhole up as a package in your favorite package managers soon, and probably write about the things I do/don’t like about each.
Happy coding, and I’d love to hear about other general purpose data manipulation functions you’ve found useful!

Link: https://dev.to//jdsteinhauser/wormhole—data-collection-cheat-sheet-and-library-in-4-languages-56de