GistTree.Com

Reverse Engineer Amazon’s Whispersync

May 26, 2020

I am a big fan of my Kindle. It is the gadget I would bring to a desert
island. I am also a pretty frequent user of the highlight feature, where you can
select a bit of text you like and save it. One thing is bothering me though:
the lack of an open API on the Kindle to retrieve my data. Amazon
provides the my-clippings.txt file on the Kindle and the “Export my
annotations by e-mail” feature, but neither of these can be run in the
background automatically, and the last read position is not accessible. I want
moar!

I want to get the raw data:

  • to print a little book with all my annotations
  • to retrieve my last read position and see in which periods of my life I tend
    to read most (reading less could be a good indicator of stressful periods)
  • to put the data in my own personal space: my Cozy (I work there).
    I could for example randomly show recent quotes in my Cozy-Home, or
    have a “books” app with a public page listing all my books. That would be the
    digital equivalent of a bookshelf. The bookshelf as a way of sharing good
    books that you have read is lost with the Kindle. I wish I could easily share
    my reading list with my friends. goodreads.com could be another way to do
    that, but I would prefer to have the data at my disposal.

The possibilities are endless… but first, I needed to get access to the
data. A long time ago, I tried to access this data through Kindle web
but could not get past the login screen by scraping: there was, if I remember
correctly, credential encryption on the client side, and I had not managed to
reproduce it correctly.

Recently, I saw a presentation on using mitmproxy to inspect the HTTP
APIs used by apps. It gave me the motivation to start again on this project: this
time, rather than web scraping, I would try to understand how the Kindle
communicates with Amazon and try to emulate the same HTTP requests to access
my data.

Set up a man-in-the-middle on a Kindle app

Disclaimer: do not try man-in-the-middle attacks on devices or accounts
which are not your own. To successfully conduct this project, you need to have
physical access to the device to install an SSL certificate authority.

To explore the flow of communication between an Android device and remote
HTTP-based APIs, things are a bit more involved than from a web browser: in the web
world we have access to the network inspector, which helps us record data,
replay calls, see requests and responses, etc. There is no such thing for
Android or iOS. The solution is to do a man-in-the-middle attack on our
device: proxying and recording all traffic through a controlled access point.

The setup is briefly described below:

  • A Raspberry Pi runs as a wifi access point (through hostapd)
  • Every packet going through the access point is redirected to mitmproxy (in
    transparent mode)
  • HTTPS man-in-the-middle is made possible thanks to a custom certificate
    authority installed on the Whispersync client device
  • A real or virtual (for example with Genymotion) Android 5
    device
  • Kindle for Android installed on the Android device (the Play Store can be
    installed on Genymotion)

ℹ️ It is also possible to use iOS, but it is easier to get access to a
virtual Android device since you do not need to own a Mac.

ℹ️ On Android, it is possible to use user certificates up to Android 7
Nougat. Starting with Android 7 Nougat, apps no longer trust user certificates by
default. More information here.

Here is a good article to set everything up.

After all this setup (a bit long and tedious but not hard), it is possible
to record HTTP dumps of the traffic between the Kindle app and Amazon.

ssh pi@192.168.50.10
tmux a -t 0
# start mitmdump and write requests/responses to outfile
mitmdump --mode transparent -w outfile
# Use the Kindle app to generate traffic
# ctrl-c to stop the proxy
exit

At this point, outfile contains all the requests and responses in the
mitmproxy format.

Recording HTTP dumps

mitmproxy uses a custom format to store dumps. It is not very interoperable:
querying and extracting data from the dumps is a bit hard. HAR is a
popular format used for example in Chrome and Firefox (the Network inspector tabs
in both browsers can import/export the HAR format). The underlying format of
HAR is JSON, which makes it readable and interoperable with tools like jq,
which are really handy for filtering and querying.

The mitmproxy repository contains a script to convert a mitmproxy
dump file into a HAR file.

To convert a mitmproxy dump into a HAR file:

git clone https://github.com/mitmproxy/mitmproxy.git mitmproxy # to have access to the HAR conversion script
pip install mitmproxy # to have the mitmdump command

# Extract the dump from the Raspberry Pi
rsync -aPrs pi@192.168.50.10:outfile dump.mitmproxy

# Convert mitmproxy format to HAR format (JSON-based format)
mitmdump -vvv -n -r dump.mitmproxy -s mitmproxy/examples/complex/har_dump.py --set hardump=./outfile.har

Understanding the flow

With jq, we can extract request URLs, which is handy to
start understanding the flow of communication between Amazon and our
device.

$ cat dumps/dump.har | jq -rc '.log.entries | map(.request.url)'
"https://54.239.22.185/FirsProxy/registerDevice",
"https://54.239.22.185/FirsProxy/registerDevice",
"https://54.239.22.185/FirsProxy/registerDevice",
"https://54.239.22.185/FirsProxy/registerDevice",
"https://54.239.22.185/FirsProxy/getStoreCredentials",
"https://52.46.133.19/FionaTodoListProxy/syncMetaData",

# Unique URLs
cat dumps/dump5.har | jq '.log.entries | map(.request.url) | sort | unique'

# Filter all requests with a particular URL
cat dumps/dump5.har | jq '.log.entries | map(select(.request.url == ""))'
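
The same queries can also be run from Node.js once the HAR file has been parsed with JSON.parse. The sketch below is only an illustration: the function names and the tiny hand-written sample are hypothetical, and only the log.entries[].request.url structure comes from the HAR format.

```javascript
// Sketch: list the unique request URLs of a HAR object, the Node.js
// counterpart of `jq '.log.entries | map(.request.url) | sort | unique'`.
function uniqueUrls(har) {
  const urls = har.log.entries.map((entry) => entry.request.url);
  return [...new Set(urls)].sort();
}

// Keep only the entries whose URL contains a given fragment (e.g. a route name).
function entriesMatching(har, fragment) {
  return har.log.entries.filter((entry) => entry.request.url.includes(fragment));
}

// Tiny hand-written HAR standing in for a real dump (only `request.url` is used).
const sampleHar = {
  log: {
    entries: [
      { request: { url: 'https://54.239.22.185/FirsProxy/registerDevice' } },
      { request: { url: 'https://54.239.22.185/FirsProxy/registerDevice' } },
      { request: { url: 'https://52.46.133.19/FionaTodoListProxy/syncMetaData' } },
    ],
  },
};

console.log(uniqueUrls(sampleHar));
console.log(entriesMatching(sampleHar, 'syncMetaData').length);
```

With a real dump, sampleHar would be replaced by JSON.parse(fs.readFileSync('dumps/dump.har', 'utf8')).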

Registration and request signing

The first call the device makes when logging in is a registerDevice call. It
contains both login and password and is used to register a device on Amazon.
After replaying this request with curl, I got a response complaining about a bad
password. This is because Amazon sends a two-factor authentication email and you
need to replay (again) the registerDevice call with the code from the
two-factor authentication email in the password field to get through.

At this point a new device is visible on the Amazon Kindle site 🙌. The next thing
to do is to fetch the list of books.

The problem now is that we can see that every subsequent request to Amazon has
an X-ADP-Request-Digest header: each request is “signed” with the help of a
certificate received in the registerDevice call.

Below, you can see an example request on the syncMetaData route used to get the
list of books:

{
  "startedDateTime": "2020-05-10T14:21:42.217752+00:00",
  "time": 6532,
  "request": {
    "method": "GET",
    "url": "https://52.46.133.19/FionaTodoListProxy/syncMetaData",
    "httpVersion": "HTTP/1.1",
    "headers": [
      {
        "name": "X-ADP-Request-Digest",
        "value": "SIG1tis85OWFqJqbzy0Z0xBzBCI3/88e9p/2jr8UvTAUQCuil5ED0833peNeKPp1dIMdVAs/INcUR//xvCJu+ngyP9olVSda/IBBxM2fftVGIDEVuQqMSC9P+O/pZMhaAJpvxIm78M52OB+lNIYXjE0Kr1OB0mmOo4iVu45aRio8hZDlmDG07zjVHnlQHE5sUjzOMnYBFC6VXw+srjYfo6dTptwSKNX11A0naG+tjcuxnglAE3R9U8/+pVr/uFNT4ou+0cQs2KbV0/4tYEIbOogC1JgjNNt4hyb2l91QED7Aj+A/DFcKBT+XNkjAUAAI1//HhCtxqCNtbu1E1sRReQ==:2020-04-10T14:21:40Z"
      }
    ]
  }
}

Signing the request means that I needed to:

  • use the right key to sign (I was fairly confident that the privateKey
    found in the registerDevice call was the one to use)
  • sign the right data, i.e. find which fields from the request (and in
    which order) were used to make a fingerprint of the request

Luckily, after a bit of searching I found the repositories of
lolsborn (readsync and fiona-client), which had implementations of
request signing for the Whispersync API.

        data = "%s\n%s\n%s\n%s\n%s" % \
            (method, url, time, postdata, self.adp_token)
        rsa = RSA.load_key_string(str(self.private_pem))
        crypt = rsa.private_encrypt(hashlib.sha256(data).digest(),
            RSA.pkcs1_padding)
        sig = base64.b64encode(crypt)

From readsync source code.

From this bit of code, I could see which fields were used and in which order. I
now “only” needed to convert it to Javascript.
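
For illustration, here is a minimal sketch of the same computation using only Node's built-in crypto module. It is not the code of the actual client (which relied on node-forge and node-rsa). The names signRequest and loadPrivateKey are hypothetical, the "signature:time" header layout is an assumption based on the recorded request above, and whether the full URL or only its path is signed is not verified here.

```javascript
const crypto = require('node:crypto');

// Sketch of the digest computation, transposed from the readsync snippet.
// `adpToken` and `privateKey` are assumed to come from the registerDevice response.
function signRequest({ method, url, time, postData, adpToken, privateKey }) {
  // Same field order as readsync: method, url, time, body, ADP token.
  const data = [method, url, time, postData, adpToken].join('\n');
  const hash = crypto.createHash('sha256').update(data).digest();
  // "Private encrypt" with PKCS#1 v1.5 padding, like M2Crypto's private_encrypt.
  const signature = crypto.privateEncrypt(
    { key: privateKey, padding: crypto.constants.RSA_PKCS1_PADDING },
    hash
  );
  // Assumption: the header value is "<base64 signature>:<time>".
  return `${signature.toString('base64')}:${time}`;
}

// Load a private key stored as base64-encoded PKCS8 DER.
function loadPrivateKey(base64Der) {
  return crypto.createPrivateKey({
    key: Buffer.from(base64Der, 'base64'),
    format: 'der',
    type: 'pkcs8',
  });
}

// Demo with a throwaway key pair and made-up values, not real Amazon credentials.
const { privateKey: demoKey, publicKey: demoPublicKey } =
  crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const demoRequest = {
  method: 'GET',
  url: 'https://52.46.133.19/FionaTodoListProxy/syncMetaData',
  time: '2020-05-10T14:21:40Z',
  postData: '',
  adpToken: 'hypothetical-adp-token',
  privateKey: demoKey,
};
console.log(signRequest(demoRequest));
```

Passing the time in as a parameter, rather than reading the clock inside the function, is what makes it possible to compare the output against a signature taken from a recorded dump.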

With the help of node-forge and node-rsa, after a good bit of
struggle, I managed to get the right signature. I found it hard, even
with two example implementations, to get the signature exactly right (I am not
an expert in crypto technologies and did not know either library, so I spent a
bit of time trying to jam the certificate and values in every possible
way 🙃).

Having a wrong signature is not immediately obvious since you need to send a
request to Amazon to check it (and if it is wrong, Amazon sends an Internal
Server Error). Using a recorded dump to have an example of the right
signature and using it in a Jest test with a fixed date (since the date is
used in the request) was very helpful, as the feedback loop was very fast.

⚠️ I had to use two different libraries to get the same signature. I am
sure it is possible to use only node-forge though.
ℹ️ The private key is in PKCS8 DER format, encoded in base64.
❓ I wonder why, to sign the requests, the devices do not generate
a private/public key pair and transmit the public key to Amazon: it is
a bit strange to have something private flow over the network.

Sidecars

After cracking the signing part, it was possible to call the syncMetaData
route, with a proper digest header, to get all the books in our library
(yay!).

However, the work was not done yet: I was most interested in getting the
metadata on the book: my highlights, annotations and last page read. I
needed to find the call that was retrieving all this data.

When searching for the content of a highlight in the dump, I could not find
anything 🤔. When filtering the URLs of the dump with the identifier of the
book (the ASIN in Amazon parlance), some URLs looked interesting:

https://72.21.194.248/FionaCDEServiceEngine/FSDownloadContent?type=EBOK&key=
https://54.239.26.233/FionaCDEServiceEngine/sidecar?type=EBOK&key=

FSDownloadContent was the call to get the content of the book. But what was
the sidecar URL? A sidecar sounds like something loaded alongside the
book, maybe containing annotations?

The body of the response was base64 encoded, and after a base64 --decode,
bingo! I could read the content of my book highlights in my terminal.

Now, this is a proper sidecar


# Using the ad-hoc CLI tool to fetch a sidecar in binary format and decode it
$ yarn -s cli fetch sidecar --no-parse B01056E716 | base64 --decode
CR!3WET39TJ553PXCAHWBHXQ1RT_PAR^̵]^̵]BPARMOBI&&<@6^^ !b#f$%*f"V        F
BPAR8FYDATApce plaisir puis dans l agressivit ainsi que cet amour de la servilit grgaire n
DATAC tait la mme illusion, procdant de la mme volont de s illusionner soi-mme.8
DATAon n avait pas encore invent le systme actuel qui consiste  assommer les gens  coups de matraque et  les exterminerDATAaffreuse tension. Le dimanche matin, la radio annonait la nouvelle que l Angleterre avait dclar la guerre  l Allemagne. Ĉ
DATAils n aspiraient  rien d autre qu  enchaner, dans un effort silencieux et pourtant passionn, des strophes parfaites dont chaque vers tait pntr de musique, brillant de couleurs, clatant d images.P
DATAnous qui attendons de chaque jour qui se lve des infamies pires encore que celles de la veille, nous sommes nettement plus sceptiques quant  la possibilit d une ducation morale des hommes.
BKMK4W*WW*ޭ$#BKMK4^,^^,ޭBKMK4~y:~yޭBKMK4ޭ! BKMK4`ޭBKMK1ޭBKMK4

(I’ve added line breaks for clarity)

As you can see, the problem was that the decoded content was a bit garbled :
accents are missing and annotations are not delimited correctly. Turns out,
Amazon uses a custom binary format for its sidecars. After encryption, binary
decoding, what an adventure 🤠!

When searching for the sidecar application type, I stumbled upon KSP
(Kindle Server Proxy), a project to connect a Calibre library to the Kindle
seamlessly. It works as a middleware between Amazon and your Kindle,
implementing routes of the Whispersync API. It serves content from your
Calibre database, or from Amazon as needed: if a book is from
Amazon, content is served from Amazon, otherwise it’s served from a local
database.

KSP did not need to read sidecars, as it was sufficient
for it to pass the content along without reading it. It did however need to
know how to write sidecars. The code was informative on the binary format
used, but I knew I needed to code a sidecar parser.

Bytes in color

At this point, I decided to have a look at the bytes in a graphic form to
understand better what was going on. I had read recently about visualising a
binary in color, so I figured I could try to do the same thing. By
reimplementing it, I would have more control over the visualisation and
hopefully it would serve when debugging the parser.

I fired up codesandbox and wrote an app that, when given a base64 string,
would output its bytes colored based on their value. I used the
HSL color space with the value of the byte controlling the hue so that nearby
valued bytes have similar colors. It is very similar to cortesi’s binvis,
but this one I could hack more easily to serve my needs.
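
The mapping from byte to color can be sketched in a few lines. This is an illustration rather than the code of the actual app: the function names, the saturation and lightness values, and the use of Node's Buffer for base64 decoding are all assumptions.

```javascript
// Map a byte (0-255) to an HSL color: the byte value drives the hue,
// so bytes with close values get close colors.
// Saturation (70%) and lightness (50%) are arbitrary choices for this sketch.
function byteToColor(byte) {
  const hue = Math.round((byte / 256) * 360);
  return `hsl(${hue}, 70%, 50%)`;
}

// Turn a base64 string into one CSS color per byte.
function colorsFromBase64(base64) {
  return Array.from(Buffer.from(base64, 'base64'), (byte) => byteToColor(byte));
}

// 'AIA=' is the base64 encoding of the two bytes 0x00 and 0x80.
console.log(colorsFromBase64('AIA='));
```

Dividing by 256 rather than 255 keeps 0x00 and 0xFF from landing on the same hue, since hue 0 and hue 360 are both red.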

Coloring bytes..


Seeing the data colored helps to see the different parts of the binary format
and find patterns. For example, here we can see at the bottom a red/green
check pattern. It is a two byte pattern, and we know that at this part, we
should have text so it is a good clue that the text is encoded in UTF16.

This app was really helpful when writing the parser since I could better
understand on which part the parser was struggling.

To write the parser, I used binary-parser. Its API makes it easy to build a
parser. To be able to combine it with the binary viewer, I monkey patched
the code to add before/after indexes to each field so that I could see each
field separately in the viewer.

Here is an extract from the parser:


const sidecarParser = new Parser()
  .endianess('big')
  .string('guid', { length:  32, stripNull:  true })
  .seek(4)
  .buffer('v1', { length:  4 })
  .buffer('v2', { length:  4 })
  .buffer('v3', { length:  4 })
  ...
  .uint16('next_id')
  .seek(4)
  .uint16('index_count')
  .seek(2)
  .uint16('bpar_ptr')
  .saveOffset('bpar_ptr_index_')

and its usage:

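// base64Sidecar is a hypothetical variable holding the base64 response body
const rawData = Buffer.from(base64Sidecar, 'base64') // binary data
const data = sidecarParser.parse(rawData)
// > { index_count: 16, next_id: 5, bpar_ptr: 300, guid: 'CPAR….' }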

The structure of the binary is as follows:

  • A header
  • Pointers to the annotations
  • Annotations data

When parsed, it looks like this:


Big endian

To format the bytes extracted from the data section, I used the Uint{8/16}Array
APIs from the browser, which serve as views on a byte buffer. This worked well
for Uint8/16 integer buffers. The problem I faced was with the string buffers:
the Uint{8/16}Array APIs work in the platform endianness (to use the native
APIs where possible). My computer was little endian whereas the annotations
were encoded in UTF-16 big endian. This meant that if I wanted to read a buffer
with String.fromCharCode I had either to swap the endianness of the buffer or
to use a DataView (a Javascript API to read/write data independently of the
endianness of the platform). I did not know DataView at this point so I swapped
the bytes, but DataView would have worked since you can choose the endianness
when reading a value.
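
As an illustration of the DataView route (the one I did not take at the time), here is a minimal sketch. The function name decodeUtf16BE and the sample bytes are made up for the example.

```javascript
// Decode a UTF-16 big-endian byte sequence with a DataView.
// `bytes` is a Uint8Array (a Node Buffer works too).
function decodeUtf16BE(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let text = '';
  for (let offset = 0; offset + 1 < view.byteLength; offset += 2) {
    // `false` asks for big-endian, whatever the platform endianness is.
    text += String.fromCharCode(view.getUint16(offset, false));
  }
  return text;
}

// "été" encoded as UTF-16 big endian: 00 E9, 00 74, 00 E9.
const sample = Uint8Array.from([0x00, 0xe9, 0x00, 0x74, 0x00, 0xe9]);
console.log(decodeUtf16BE(sample));
```

Reading the same bytes through a Uint16Array on a little-endian machine would yield 0xE900 instead of 0x00E9 for the first character, which is the kind of garbling described above.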

Data corruption

To test the parser, I downloaded sidecars with the Whispersync client
I was building and dumped their contents in the test folder. When trying
to parse them, the parser was struggling and I could see many “EF BF BD” bytes
colored in blue.

Corrupted bytes visible in blue


These bytes were mainly present in the part of the binary containing
pointers to the annotation locations. This made the parser fail. After a
bit of struggling trying to shift the data or ignore those bytes, I searched
for “EF BF BD”.

EF BF BD is UTF-8 for FFFD, which is the Unicode replacement character
(a sign of data corruption, used when bytes cannot be decoded as valid
characters).
https://stackoverflow.com/questions/47484039/java-charset-decode-disaster-in-converting-string-to-hex-code

I figured out that the request JS library used to fetch HTTP responses
tried to decode the data as UTF-8 by default and failed (as sidecars are
transmitted as binary data over the network). The solution was to put
{ encoding: null } in the request options so that it returns
a Buffer. The buffer can then be encoded in base64 to print it to the console
with no garbled characters. This base64 output can be piped into base64 --decode
before dumping it to disk (console.logging the buffer would try to
decode the data as UTF-8).
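
The corruption is easy to reproduce without any network call. The sketch below uses three made-up bytes rather than a real sidecar: decoding binary data as UTF-8 and encoding it back yields EF BF BD sequences, whereas a base64 round trip leaves the bytes untouched.

```javascript
// Bytes that are not valid UTF-8 (0xff and 0xfe never appear in UTF-8),
// followed by a plain ASCII "A".
const binary = Buffer.from([0xff, 0xfe, 0x41]);

// Decoding as UTF-8 and re-encoding replaces the invalid bytes
// with U+FFFD, whose UTF-8 encoding is EF BF BD.
const corrupted = Buffer.from(binary.toString('utf8'), 'utf8');
console.log(corrupted.toString('hex'));

// Going through base64 instead keeps the bytes intact.
const roundTripped = Buffer.from(binary.toString('base64'), 'base64');
console.log(roundTripped.equals(binary));
```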

HTTP server to explore content

At some point, it seemed to me that the CLI was a bit too limited to explore
annotations easily. I added an HTTP server
serving an index of the books and a book page showing the annotations. The
content is fetched from Amazon if it is not on disk, and it is then saved in
JSON format for later access.

I used fastify, which is an alternative to express that promises to be much
faster. It worked well for my simple purposes.

Here are two screenshots:



Conclusion

All in all, it was a very good experience. I learned about:

  • Man-in-the-middle in practice
  • Encryption
  • Binary decoding
  • String encodings

There were a lot of hurdles but I was really pumped up by the fact that the
data recovered was important to me: I love reading and am now happy
that my highlights will live in a format that I will be able to read
a long time from now. I am also happy to have them in a simple JSON text format
that I can query using tools I am familiar with: jq, rg or grep.

Next plans:

  • make a Cozy connector using the whispersync-client to have
    my annotations automatically synced to my Cozy 🚀.

  • see if it would make sense to have some kind of integration with Calibre

The code is available here for educational purposes: whispersync-lib-code.

Resources

Several resources were of great help when building this project.

  • Lolsborn’s repos: readsync and fiona-client

    • Useful for request signing and for bootstrapping the structure
      of the Javascript client (it turns out Scala is quite quick to convert
      to Javascript).
  • KSP (Kindle Server Proxy)

    • Implementation of a middleware server for Kindle, seamlessly connecting Calibre and Kindle
    • Recreates the sidecar, helpful to understand the format of the sidecar
  • Reverse engineering a file format on WikiBooks.

    • Good general information on how to start with the decoding of a binary
      format.

