Previously in Experiments, part 4,…
I added a form for creating new links. Instead of introducing a database, the links are being added to an
In tonight’s installment…
I am going to start working on the trickier parts of bookmarking links: retrieving the title of a link and possibly the body. At some point, I’ll have to commit to a database, but I can push that off for at least one more post. I would like to have background-queue-like functionality without introducing an actual background queue, and I think core.async can help me do that.
What I would like to do first is fetch the title of the posted URL. This will enable me to have a nice description of the link I saved. I need a library to which I can give a URL and have it fetch the HTML of the page. It would be nice if it could also parse that HTML. The most popular library in Clojure that does this is called Enlive.
As usual, I’m going to add the library to the project.clj file and require it in the
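A minimal sketch of what that setup might look like. The project name, the Clojure and Enlive version numbers, and the namespace name are assumptions for illustration; only the `html` alias is taken from the calls used later in this post.

```clojure
;; project.clj: add Enlive to :dependencies.
;; "experiments" and both version numbers are assumptions.
(defproject experiments "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [enlive "1.1.6"]])

;; In the namespace that handles links (the namespace name is hypothetical),
;; require Enlive under the alias html so that html/select resolves.
(ns experiments.core
  (:require [net.cgrand.enlive-html :as html]))
```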
With that done, I need to add a few functions to fetch and parse HTML.
This is the first function where Clojure’s interop with Java starts to show. Using Java’s URL class to open the connection, the code threads that connection through a few steps, ending with the getContent method. Enlive then turns that resource into a map that I can iterate over. A quick REPL test shows me how this looks.
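The fetch step described above could be sketched roughly like this. The function names `fetch-url` and `parse-html-string` are hypothetical, and the sketch assumes the server answers with HTML; it has no error handling or timeouts. Strictly speaking, Enlive's `html-resource` returns a sequence of node maps, each with `:tag`, `:attrs`, and `:content` keys.

```clojure
(require '[net.cgrand.enlive-html :as html])
(import '(java.net URL)
        '(java.io StringReader))

(defn fetch-url
  "Opens a connection to url and hands the response to Enlive.
  For an HTML response, getContent typically yields an InputStream,
  which html-resource knows how to parse."
  [url]
  (-> (URL. url)
      .openConnection
      .getContent
      html/html-resource))

(defn parse-html-string
  "Offline helper (hypothetical): parses an HTML string the same way,
  handy for poking at the node structure without hitting the network."
  [s]
  (html/html-resource (StringReader. s)))

;; At the REPL, the top-level node of a parsed page is the html element.
(def sample-nodes
  (parse-html-string
   "<html><head><title>Example</title></head><body></body></html>"))

(:tag (first sample-nodes))
;; => :html
```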
OK, I have a map representing the HTML; now I need to get at the title.
Enlive lets me select the element based on a vector I pass in. Since all I want is the title, it’s a simple vector.
html/select takes the map of HTML and the selector vector and gives me back the nodes that match the selector.
I then extract the content and make it the return value.
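As a rough illustration of the selection step, here is one way it might be written. The helper `title-of` is a hypothetical name, and using `html/text` to pull out the string is my choice for the sketch, not necessarily how the original code extracts the content. `get-title` matches the name used later in this post, but its body here is an assumption.

```clojure
(require '[net.cgrand.enlive-html :as html])
(import '(java.net URL)
        '(java.io StringReader))

(defn title-of
  "Given parsed Enlive nodes, returns the text of the first <title>
  element, or nil when the page has none."
  [nodes]
  (some-> (html/select nodes [:title]) ; seq of matching nodes
          first                        ; the first <title> node, if any
          html/text))                  ; its text content as a string

(defn get-title
  "Fetches url and returns its title. Blocks until the page is read.
  html-resource accepts a java.net.URL directly."
  [url]
  (title-of (html/html-resource (URL. url))))

;; Offline check against an HTML string instead of a live site.
(title-of
 (html/html-resource
  (StringReader.
   "<html><head><title>Example Domain</title></head><body><p>hi</p></body></html>")))
;; => "Example Domain"
```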
Now that I have that working, I am going to do a simple integration into the current flow. POSTing the new link will block until the title is extracted, which is what core.async will solve.
The change I made is in assoc, where instead of :title (:url new-link), I now have :title (get-title (:url new-link)). This should block while getting the title.
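To make the shape of that change concrete, here is a small sketch. The `with-title` wrapper is a hypothetical name, and `get-title` is stubbed out so the snippet runs without network access; in the real flow it would fetch and parse the page, which is where the blocking happens.

```clojure
(defn get-title
  "Stand-in for the real title fetcher (an assumption for this sketch).
  The real version makes an HTTP request and blocks until it finishes."
  [url]
  (str "Title for " url))

(defn with-title
  "Returns new-link with :title filled in from the fetched page title
  rather than from the raw URL."
  [new-link]
  ;; Before the change this was: (assoc new-link :title (:url new-link))
  (assoc new-link :title (get-title (:url new-link))))

(with-title {:url "http://example.com"})
;; => {:url "http://example.com", :title "Title for http://example.com"}
```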
The time it takes for the new link to show up varies based on the response time of the website, so while it can be quick some of the time, other times it might take a while.
That’s it for now. In the next post, I am going to make that fetching asynchronous, which may take a bit of rejiggering of things.