Quite some time has passed since I last wrote a coherent article on this blog. Most updates since the big "Build on Stolen Data" post have been minor development updates. And so, I'm scared that, from an external point of view, it may look like the project has stalled. No new blog posts, no new software updates: it must mean the project's dead.
But, of course, it's not. However, I've been more hesitant to build over the last month. It's been another one of those moments where I deliberately took quite a step back to think thoroughly about what should and can be next.
Already during oceanDAO's R7 vote, I sneaked out - caught a plane to Italy, and had a week of holiday relaxing on Mediterranean beaches. That vacation was direly needed and super nice - but I'm not going to lie - the week before boarding the plane ended up being quite stressful, particularly as I had to make an extraordinary decision:
See, running a web server is no joke and not something "fun". On the contrary, it's a duty that requires a large capacity of time and mental commitment. For example, there's my personal blog that I've hosted on GitHub Pages. I honestly never worry about it as it consists purely of static pages hosted by GitHub. I don't use a custom domain, so its online status is almost fully in the hands of GitHub's engineers. However, I also don't try to sell things on there, and I'm not trying to build business relationships on it either. It's personal.
That's quite different for rugpullindex.com. It's all self-hosted and I've deliberately framed it as a highly reliable and performant service. Of course, I could choose to frame it differently. I could even choose to build it the easy way too. But I like to think that I'm gaining a competitive edge by DIY'ing my infrastructure. Additionally, I take quite some pride and gain motivation because I'm able to deliver on these personal goals.
But then what to do in case I direly need some vacation from my 1-dev startup? And indeed, that was the dilemma I found myself in before going on vacation. So the short answer to a rather long story is that I've prepared myself to go without my laptop:
I messaged some of my data providers and asked for planned breaking changes in the upcoming weeks. For months now, I've improved the crawler's reliability by building redundancies. In addition, I made sure to have some backup plans for some of the worst-case scenarios.
And what can I say: It all worked out (gladly). I ended up being in Italy for one full week on a fantastic and relaxing vacation while rugpullindex.com continued to deliver its service. So that made me happy. But then also recently, and thanks to the fact that I could step away for one week, I've also started to reflect more on what's important for a project like RPI.
Historically, and that's true for many of my prior projects, I've prioritized building product and utility before anything else. My idea had always been that a great product distributes itself through, e.g., word-of-mouth marketing. And while I don't doubt the existence of such a distribution effect - I also don't believe it to be a very effective marketing strategy for a product with an audience of RPI's size.
Indeed, over months of observation and testing, I've learned that mostly the opposite is true. The attention economy is real, and this blog's reach is tiny, especially when compared to what huge quantities of media we consume elsewhere.
And that's why, after my vacation, I've prioritized "softer" goals over "hardcore" product development.
If you'd picture me a month ago, I'd probably be this guy with a wrench in his hand and a hardhat on his head. Today, I'd suggest that I'm at least wearing a tie or a fancy shirt. I'd probably look pretty weird, wearing both tie and hardhat. But whatever, both are valuable roles for sure - sometimes, one is more important than the other.
To conclude this rather complex post, I'd like to leave you with something more simplistic from yet another self-help startup book I've recently discovered. It's called "Start Small, Stay Small: A Developer's Guide to Launching a Startup" by Rob Walling. On an early business's priorities, he claims that:
"Market comes first, marketing second, aesthetic third, and functionality a distant fourth."
I finally managed to implement a minimal user system to handle more than one user for the API. I've now sent out two additional keys to community members who asked for API access. If you, too, are interested in using the API, please shoot us a message on Discord or email.
A few days ago, GitHub released a new tool for developers called Copilot. It's a plugin for text editors such as VS Code that allows developers to "Just Hit Enter" to autocomplete the code they've written so far. I recommend checking out Copilot's website. Their showcases are pretty incredible.
In fact, they're so awe-inspiring that I had a few chats with people about it in recent days. "Man! We're the next to get replaced by AI!" and so on, were the initial reactions. I sympathize with those emotions - but I was also fairly skeptical. I immediately had doubts about that bot helping me write helpful unit tests - which I find the most tedious and analytically laborious task in writing software. I don't think it'll understand different testing contexts.
But some people on the Internet managed to get ahold of this program quickly and started exploring its outputs. Suddenly it became conceivable that the "public" data GitHub had used to train Copilot may include restrictively licensed code, e.g., code under the GPL.
Someone pointed out on Twitter:
github copilot has, by their own admission, been trained on mountains of gpl code, so i'm unclear on how it's not a form of laundering open source code into commercial works. the handwave of "it usually doesn't reproduce exact chunks" is not very satisfying pic.twitter.com/IzqtK2kGGo— eevee (@eevee) June 30, 2021
Shortly after, another person followed up by having Copilot generate the infamous fast inverse square root algorithm from Quake III (GPL licensed).
I don't want to say anything but that's not the right license Mr Copilot. pic.twitter.com/hs8JRVQ7xJ— Armin Ronacher (@mitsuhiko) July 2, 2021
Indeed, the latter tweet's author even tricked the auto-completion algorithm into suggesting a license for the code as well.
Now, I think it shouldn't come as too big of a surprise that the inputs of a machine learning algorithm can reappear as fragments in output later. I guess that for ML-generated images, our brains are just not good enough to compute matches. Hence, we credit most contemporary algorithms as "pretty original". In contrast, it seems relatively easy for code-copy-cats to be debunked. Particularly when this one suspicious-looking random-ass hex value "0x5f3759df" is auto-suggested.
For me, what I found much more scandalous is, however, this other dimension where it appears we've caught Microsoft stealing from all GitHub users. Could it be that it simply had dumped all sorts of restrictively licensed open source code (including MINE!) into their model? Could it be true that they're now profiting by stealing from my work?
That'd be outrageous! And so, as I was informing myself some more, this subtle anger - triggered by my sense of justice feeling hurt - started growing.
It only got worse when I read more Hacker News comments. Specifically, one which claimed that many image recognition algorithms were likely just trained with copyrighted image data too. I mean, I don't even know what to say about that.
Indeed, now when I'm writing these lines - it's so absurd - I feel like doubting myself. Is it really that bad? But see, for me, it didn't even occur until now that training a model on copyright-protected images is an option. My most prolonged understanding of how these models were built was: (1) Launch a successful social network. (2) Setup shitty T&Cs. (3) Harvest data from users legally (but maybe immorally).
However, to now learn that a giant like Microsoft dares to just release a product built on looting is genuinely a shock. Particularly when considering that I could be part of the wronged ones. Or that I could be part of the damaging party by having accidentally used such an illicit algorithm.
Right now, it feels absurd. I'm classifying the usage of copyrighted content in machine learning as deeply immoral and illicit. Involuntarily, an image of a blood diamond comes to mind.
I think it should be illegal and pursued to distribute such a piece of code.
Edit on July 8, 2021:
oh my gods. they literally have no shame about this. GitHub Support just straight up confirmed in an email that yes, they used all public GitHub code, for Codex/Copilot regardless of license. pic.twitter.com/pFTqbvnTEK - ✨ Nora Tindall 🪐 (@NoraDotCodes) July 7, 2021
I only just entered the blockchain world professionally having watched closely from the sidelines for a number of years.
I've worked in corporate Advertising and Tech consultancies over the last 10 years and tried my hand in a number of different startups.
I recently joined the Ocean DAO and met Tim. I could see that what he was doing would be successful if Ocean is successful and while the index was technically sound, I knew I could contribute to its development by helping to round out Tim's strong technical skillset by paying special attention to users' needs.
Firstly, I wanted to identify the opportunity available to the index and answer the following questions;
Knowing this gives us a much clearer idea of what we're building and why we're building it. Opinions are great, but it's better to have actual feedback from real users. That way you know you're building something for real people and not just building the thing you want to build (something I've been guilty of doing in the past!).
Some were simple suggestions like 'I think you should change the name' while others were more illuminating and got to the core value proposition from the user's perspective like, "I don't currently use the site because I'm not looking to invest in data tokens."
After adding all 77 one liners to a spreadsheet I went through and tagged an associated theme to each piece of feedback (e.g. Risk, Education, Marketing etc).
I made a note of these findings and plan to feed them into marketing initiatives and prioritisation decisions in the future.
After digging deeper into the findings it became clear that an immediate opportunity exists to help users make better $OCEAN staking decisions.
I personally wouldn't have seen this as being RPI's primary opportunity to begin with but approaching users with an open mind and enabling insights to emerge from speaking with them is a great way to unearth hidden gems.
The secondary objective was to help users make better data token investment decisions which was more in line with my expectations.
After breaking the raw user feedback data down further I identified 9 key action items.
I then added these to the table below and prioritised them using the RICE prioritisation framework.
This helped us identify the relative priority of the various tasks at hand and plan accordingly.
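For readers unfamiliar with RICE: the framework scores each item as reach × impact × confidence ÷ effort and then sorts by that score. Below is a minimal Python sketch of the calculation. The item names and numbers are made up for illustration and are not the values from our table.

```python
from dataclasses import dataclass


@dataclass
class ActionItem:
    name: str
    reach: float       # users affected per time period
    impact: float      # e.g. 0.25 (minimal) up to 3 (massive)
    confidence: float  # 0.0 to 1.0
    effort: float      # e.g. person-weeks, must be > 0

    def rice(self) -> float:
        """RICE score: reach * impact * confidence / effort."""
        return self.reach * self.impact * self.confidence / self.effort


def prioritise(items):
    """Return the items sorted from highest to lowest RICE score."""
    return sorted(items, key=lambda item: item.rice(), reverse=True)


# Hypothetical action items, for illustration only.
items = [
    ActionItem("Rename the site", reach=500, impact=0.5, confidence=0.5, effort=4),
    ActionItem("Explain the ranking", reach=200, impact=2, confidence=0.8, effort=1),
]

for item in prioritise(items):
    print(f"{item.name}: {item.rice():.2f}")
```

The useful property is that a cheap, well-understood task can outrank a far-reaching but expensive one, which is the kind of trade-off a Now, Next, Later split relies on.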
The 9 items have been broken down further into a Now, Next, Later table enabling us to be clear on the areas of focus.
It's unlikely we will follow this table completely as things will inevitably change over time. However, this does give us a great foundation to build upon and start delivering value to users soon.
Tim was already able to publish an article explaining how the ranking algorithm works. You may need a PhD in Mathematics to understand it but that's beside the point.
We've got more work to do in order to break down these one liners into actionable user stories. For now though, we have a new and refined focus - helping users make better $OCEAN staking and datatoken investment decisions.
...watch this space.
Written by: Scott Milat
StateTree library for handling state transitions using a Merkle tree.
In several previous updates, I talked about how difficult it is for RPI to monetize considering the uncertainty surrounding Ethereum gas prices. A few weeks ago, I said that deciding on a scaling solution is difficult. I mentioned that I'm doing my own research too.
Today, an important moment in this process has come. I've been able to put a tiny amount of my idea into code. Introducing honeybatcher, a pessimistic (or as I'd prefer: "strongly-consistent") rollup.
Now what's the idea here? The idea is that both zero knowledge proofs and building optimistic consensus mechanisms are incredibly difficult. Hence, instead of using cryptographic proofs or building my own distributed system, I thought about what I could leverage today to minimize gas costs and complexity.
First off: The EVM - it's awesome. It allows running completely deterministic code. That's useful, as the complexity of our system is reduced immensely if we just assume a smart contract to be the permanent leader within our distributed L2 network. Not only does it make writing sequencer code easier, as just a single contract has to be monitored, it also allows a user to withdraw their funds at any time. If an Ethereum smart contract is the leader of our distributed L2 system, any possible write or read can be done at any time without losing a consistent view over the system's state. This is different from optimistic rollups, where consistency emerges from a delayed economic finality mechanism that is incentivized through fraud proofs.
Now with regard to minimizing gas cost, I found that a single Ethereum transaction costs at least 21k gas. More is required for special EVM executions. If we observe transactions on Ethereum, however, we'll see that many addresses do the same thing: they either swap or transfer their tokens. Hence, if we were able to batch-execute multiple transactions in one, we could save a second user up to 10.5k gas and a third user 14k, and so on. Furthermore, by building the rollup in a use-case-specific manner, we could save further gas (e.g. by tracking balances in Merkle trees and by authorizing ERC20 transfers using ecrecover (only 3k gas)).
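The savings figures follow from splitting the 21k base cost evenly across everyone in a batch. Here's a rough Python sketch of that arithmetic. It assumes the only shared cost is the 21,000 gas base fee and ignores per-transfer costs such as calldata, storage writes or the ecrecover call, so treat the numbers as an upper bound.

```python
BASE_TX_GAS = 21_000  # minimum gas charged for any Ethereum transaction


def gas_saved_per_user(batch_size: int) -> float:
    """Base-fee gas each user saves when `batch_size` users share one transaction."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return BASE_TX_GAS - BASE_TX_GAS / batch_size


for n in (1, 2, 3, 10):
    print(f"{n} users per batch: {gas_saved_per_user(n):.0f} gas saved each")
```

The saving per user approaches the full 21k as the batch grows, but with diminishing returns: going from one to two users saves more than going from ten to a hundred.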
Now, for the code I've published today - it's not a big deal. As you may be able to infer from the above-linked repo, the contract is merely a single function. There's not even a repository for a sequencer node yet.
Still, I created a new entry in this blog to celebrate my achievement. It's with its existence and the honeybatcher GitHub repository, that I'm finally able to share an idea that so far has been purely stored in my head. Today, I'm happy because I managed to express a complex thought in words and code. I'm happy as judging from my experience, it means that the amount of complex thought I'll have to go through from now on, will be less than today. It'll be downhill from here.
So what's next? Next is that I'll continue contributing code to the honeybatcher repository. I'll keep you posted on my progress here.
PS: This blog post was written in Berlin's pitiless June heat. I'm, hence, hoping that what I wrote can be considered a coherent thought and not some brain melted icecream lol 🍨
I mentioned in my last entry that the gini coefficient seemed too positive for some data sets. In particular I talked about VORSTA-2 and how it was able to enter at rank #3 with only 3 token holders, the richest one owning 98% of its pool. Below is a picture of VORSTA-2's share distribution. I don't think there's a lot of math needed to understand that this pool has no equal distribution of liquidity.
When I implemented the gini coefficient score at the end of last year, I defaulted to a formula from Wikipedia that uses the income distribution of a population and a relative mean for the incomes' absolute differences (see below).
Essentially, it compares each LP's stake in the pool and then divides (roughly) by the mean stake. For VORSTA-2 (and similar ones), the formula allowed it to rank highly on rugpullindex.com as any small share in the pool counted. For VORSTA-2, 0xda2d9's stake (roughly $35) contributed as much to the calculation as the stake of 0x89717 ($12k).
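To make the problem concrete, here's a small Python sketch of the relative mean absolute difference formula as I understand it from Wikipedia. The stakes are made up to resemble a pool where one LP holds 98%. They are not VORSTA-2's real balances, and the function is an illustration rather than the site's actual code.

```python
def gini(stakes):
    """Gini coefficient via the relative mean absolute difference.

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x))
    """
    n = len(stakes)
    total = sum(stakes)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero stake")
    mean = total / n
    abs_diffs = sum(abs(a - b) for a in stakes for b in stakes)
    return abs_diffs / (2 * n * n * mean)


# Hypothetical pool: one LP holds 98%, two LPs hold 1% each.
print(round(gini([98, 1, 1]), 4))
```

With this formula, a pool of n holders can never score above (n - 1) / n. For three holders that ceiling is about 0.67, so even a pool dominated by a single LP looks deceptively equal.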
Using pen and paper, I sat down to think about a solution. I think I found a simple one that I'd like to share now:
The new ranking went live before this blog post. You can check it out on the front page. In my opinion, the solution improves the ranking's results. Small sets with an apparently positive gini coefficient are filtered, while we're still giving data sets with a bigger community a fair chance.
Looking at the results, I found it curious how many data sets had "sneaked up" in the ranking over time. Obviously, I can't say anything about any publisher's intentions. It could have been that some moved some liquidity to improve their scores. It could have also just been regular liquidity providing. It could have been chance.
In any case, however, I'm now motivated to further improve the gini coefficient's safety by tracking data that's capital-inefficient to manipulate. I already have some ideas. Until then, feel free to continue playing mouse if you dare :)
PS: Thanks so much for voting in OceanDAO's Round 6!
The crawler failed tonight due to a potentially invalid assumption of mine. Both the BDP marketplace as well as OCEAN's instance of Aquarius returned data for "VORSTA-2", when in my view only Big Data Protocol's instance should have returned anything as the data set was launched there.
I addressed the problem with a workaround and filed an issue over on the Ocean Protocol GitHub.
Finally, regarding the calculation of VORSTA-2's gini index, it's easy to see that its composition of liquidity providers shouldn't yield such a positive gini index of < 0.7 as it does today. If we look at its balances, they're roughly equivalent to:
I haven't had time to look into the math of why it's such a positive gini index, but I'm guessing that it has to do with the small number of LPs. In the upcoming days, it's likely that I'll, hence, increase the minimum number of LPs a data set pool needs to have (e.g. 5). I'll also look into how I can make the gini index calculation favor popular pools over less populated ones.
max-width: 70rem while other pages have
<footer> in all pages.
/about page and introduce
From rugpullindex's first days, I made its website analytics public by using plausible.io. I'm a huge fan and paying customer of their service. I like the idea of making my website's stats public. And, I'm a fan of using a solution that's privacy-friendly. Actually, I liked plausible's product so much that I've presented this website's stats as a voting criteria in all OceanDAO rounds.
Recently, however, I became aware of a problem with plausible's web analytics, that I believe is important to share here. It has to do with the fact that many ad blockers are now blocking plausible's tracking script.
If we take a look at RPI's six-month unique visitors, we see that it's been fairly stable. Between February and May, there's been roughly the same number of unique users per month (around 500).
We can also look at plausible's 30-day view to get a feeling for how many unique users visited in the last 30 days:
Indeed, the stats look plausible. Since I started to deliver my website through Cloudflare on April 21, 2021, however, I noticed a large deviation on their website analytics frontend for the same timeframe.
While plausible logs 596 unique visitors, Cloudflare displays 2.81k! I found this difference quite interesting, given that both services use completely different techniques for tracking unique website visitors.
And though plausible is a company publicly committed to privacy-friendliness, they've publicly stated that their script has started to appear on blocklists. This means that users of any ad-blocking browser might not show up as unique visitors in our plausible.io stats.
Seeing this, I thought it may be interesting to compare both sets of data side-by-side and so I did using a Google Spreadsheet:
And, indeed, the difference is huge. But it makes sense. According to some napkin math, plausible.io has counted 87% fewer unique daily users on average than Cloudflare.
Plausible's co-founder Marko Saric came to a similar conclusion when, a year ago, he blogged about an observation that 26% of a tech-savvy audience blocks Google Analytics over plausible.io's tracking script.
Seeing these facts, I found it reasonable to assume that a similar effect may happen to rugpullindex. And that's what I was able to confirm with my napkin math.
Given that users can't directly influence Cloudflare's unique visitors metric by e.g. installing an ad blocker, we can assume that on average, the daily count of unique visitors from plausible.io deviated by a whopping -87%. Instead of plausible's 590 unique visitors since April 26, 2021, it's more likely that rugpullindex had a total of 3767 unique visitors.
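The napkin math boils down to averaging the daily relative deviation between the two sources. Here's a sketch of that calculation in Python. The daily counts are made up for illustration and are not the real numbers from my spreadsheet, and the function name is my own.

```python
def mean_relative_deviation(observed, reference):
    """Average of (observed - reference) / reference over paired daily counts."""
    if not observed or len(observed) != len(reference):
        raise ValueError("need two non-empty series of equal length")
    deviations = [(o - r) / r for o, r in zip(observed, reference)]
    return sum(deviations) / len(deviations)


# Hypothetical daily unique visitors, for illustration only.
plausible_daily = [20, 15, 30]
cloudflare_daily = [100, 150, 120]

deviation = mean_relative_deviation(plausible_daily, cloudflare_daily)
print(f"{deviation:.1%}")
```

Note that averaging daily percentages isn't the same as comparing the two totals, so a figure computed this way won't exactly match the ratio of the overall visitor counts.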
Plausible has reacted to the problem by allowing users to submit stats through a proxy. As rugpullindex's stats are vital for its community to determine its future funding and value, it has hence become my recent priority to address the problem. Step one was writing this post to inform everyone.
Finally, I'd like to say that I'm a bit sad about this situation. I personally care about my privacy online and I hate that tracking has become a sound business model for many tech companies. It's the reason I've always been excited about data sovereignty, OP and rugpullindex. I find it stupid that my project's stats are distorted because others are extracting millions through non-consensual tracking. Is this why we can't have nice things?
Additionally, I believe that merely tracking user flow isn't falling into the category of "privacy-invading" technology and I'd therefore like to say that blocklist maintainers should start offering more opinionated views (e.g. support plausible.io but not GA).
Finally, I'd like to say that I don't want to blame anyone directly for the situation. Rather, I believe that it's a systemic issue that may be addressed by finding better business models for cyberspace that don't rely on ads.
I'm hoping to correct the stats tracking soon. And, that's all for today.
/api/v1/indices/OP-COMPOSITE-V1/ranks/ was failing the daily integration tests. It's fixed. Starting to improve API tests structurally now.
Here's a quick update on the things that I've been up to in the last few days:
From the outside, these projects might look arbitrary. To me, however, they're helpful to gather more information about the state of scalability. I want to find out: Can we truly build scalable dapps today? Dapps that don't have insane gas costs. And if so, how?
The reason for taking a closer look is to have clarity that building a scalable dapp is possible today. You see, I'm sceptical as I've been burned by scaling promises both in the "permissioned blockchain" era as well as when "Ethereum Plasma" was a thing. Now, rollups seem to make a lot of sense from a scalability perspective. But let's not forget that previous scaling concepts made sense too. In the end, what matters is the rollups going into production and how much throughput they'll truly be able to have.
I'm looking into rollups, as for me, there's no point in having my users trade assets for a $100 transaction fee. But not only that. I think that it's starting to become an obligation to stand up as a blockchain developer and take responsibility for the terribly polluting nature of PoW chains. Sure, mining could be done with clean energy. But is it?
To have a positive impact, I think it's important to optimize for efficiency. Any tx that can be shaved off should be. And, hence, just building yet another simple dapp doesn't cut it for me this time. It'll have to scale.
And so yeah, hence the research into rollups. It's been fun and I'm excited to continue improving the project.
GET /indices/OP-COMPOSITE-V1/ranks/:did failed integration tests and is working again.
This is a backend change that you won't notice on the website currently.
I feel like it's been a long time since I made time to write something in this journal. That's because, as I may have mentioned, I favored building over communicating in April. I saw a strong need to do so and now that I've been able to speed up the page, I'm enjoying the fruits of my work.
However, there's a certain thing that's been on my mind a lot recently. As you may have seen from the front page, OceanDAO's Round 5 is in progress. And with that, there's another opportunity for rug pull index to receive income so that I can pay rent.
In the beginning, I shrugged off the vote easily as the project was young and since I wasn't convinced of its importance. In recent weeks, I've realized that it, too, became an important part of my online expression and a regular part of my life. I love doing things my way. I feel empowered. Working for RPI feels great.
But with that additionally gained freedom, as usual in life, there also comes a fear of losing it. I learned that when applying for OceanDAO R5 last week. Simply put, I've been surprised to see that through its competitive nature, most proposals have risen in quality. The fundraising, too, has become more complex and time-consuming. There's now more capital voting than before.
And while I have no doubt that this capital can also work in RPI's favor, it makes me anxious to see that its existence is directly coupled to the vote's outcome. If I ran out of money, justifying the time I spend here would become difficult, as I'd earn more from serving freelance clients.
This leads me to the point of this article, which is monetization. Or shall I say diversification and the spreading of risks? In Peter Thiel's "Zero to One" there's a section where he goes on to say that building a business is mainly about surviving through progress and time. Along with that he states that PayPal has made more money in its most recent year than in all other years of its existence combined. For him, it's all about continuity. I feel the same about starting a project.
Service and its quality is 99% about showing up and about consistency. It's about continuing to build trust. I know this as I've been part of BigchainDB GmbH and ascribe in their first years. I've witnessed the doubts of potential customers first hand. Sure, we faced problems building the product. But convincing customers to commit to our product was more than just writing code. It was about the team's quality, commitment and the perceived chance of the company's survival. A customer trusting in us was a customer buying from us.
I made that experience in 2015 - 2017. I realize it now and I want to act on it by making time and external progress the friends of rug pull index. It shall have limited downside and unlimited upside. Practically speaking, I can execute on that by removing weak parts and by improving the good parts. A weak part I've discovered is its dependence on income through the OceanDAO.
Hence, for the future, I want to commit myself not only to building and communicating but also to the goal of increasing and diversifying the project's income and funding.
A preliminary idea: Sell ads. Because why not? Building the ad spots took me 5 minutes. I've added outbound click tracking too and the results are starting to come in. Last week, there was a 32% chance that a user clicked a link on rugpullindex.com. At least for last week, it confirms that this newly created estate holds value. I'm open to discuss rates. Contact me.
So it remains exciting!
To conclude: yes, you've heard it right, I need the cash (to) flow! I need it now. If you have ideas too, please get in touch!
Over the last month or so, the site had started to slow down significantly for a few reasons:
That last one ended up taking 4-6 seconds uncached. You may never notice it, as I'm using Cache-Control, NGINX cache, and now also Cloudflare's CDN for caching the page all day.
However, I noticed the annoyingly slow page loads, so I decided to do something about them. Not only were they a problem for general usability, but they also stood in the way of adding other features, e.g.:
Hence, on Monday, I decided that computing scores on the fly has to stop! And since then, I've been heads down adjusting the database such that we get sub-second latencies for loading / on an uncached browser window.
Last week, I already took care of reducing the number of extra round trips by also finalizing a specially made library for drawing SVG line charts. svg-line-chart is integrated server-side and hence adds no additional overhead on delivery.
We end up with the following: The delivered data when accessing rugpullindex.com has a total footprint of 50KB, making it quickly available from almost any network!
Speed is a killer feature. I truly believe in that, and I hope you now come to rug pull index happily, knowing that the browser will load the page in no time.
Cache-Control headers for all static assets
I felt stressed before Easter. Through COVID lockdowns and all, I ended up mostly working anyways. I mean, there isn't much more to do really. But then for Easter, I decided that I don't want to spend time on the computer. I felt like doing a small digital detox when visiting my parents in the south.
So instead of my computer, I ended up packing lots of books and even some utensils for drawing. Now that Easter's done, I'm happy I've tried!
The main reason I wanted to bring the computer was indeed rugpullindex. What was I supposed to do when the crawler went down or when other problems arose?
"For one, the crawler works fairly stable now," I thought. But another reason I felt comfortable leaving the Macbook Pro at home was that dev tools on Android are starting to become interesting.
To be able to maintain the rugpullindex server, I ended up generating an ed25519 SSH key on Termux and uploaded it. I tried SSH'ing into it and it seemed to work perfectly.
I also played around with Termux and related apps in the F-Droid store. They start to look quite promising. It's like having a full-featured shell on your phone. Some people even use it with a Bluetooth keyboard.
It sounds like such a cool idea for a product. Make phones more like computers. SSH over mobile internet. Anywhere!
Anyways, apart from some minor edits to the page, I didn't have much to maintain, fortunately. But using vim and SSH, I managed to write this little post, reflecting on this experience.
Happy Holidays everybody!
In the last OceanDAO town hall, I talked about rug pull index. Here is the recording:
Last week, Kevin from datapeek.org asked me to do an interview with him. As I found the idea fun, I said yes and we had an email-based chat. The interview ended up being mostly about rug pull index and how I ended up working on it. It was my first time ever being interviewed. And what can I say; I enjoyed being in the limelight for once!
You can read it here.
That's all for this week. I'm wishing you a nice weekend. And hoping for myself that the crawlers stay online this time around.
When you garden plants, sometimes just a little trimming of one or two leaves or branches is required to allow the plant to grow further. Today, after having a call with one of my users, I felt the need for trimming.
I updated to the latest version of classless.de, my CSS framework, and I rearranged the front page to show the information more quickly. Just a while back I read an essay called "Speed is the Killer feature". Today, I feel like it reflects my principles for building web apps well.
I hope you like the updated front page.
Oh, how I wish I had a solution to the crawler problem that rugpullindex.com is currently experiencing! As I said, I recently switched from Ethplorer to Covalent because I had experienced a bug with Ethplorer. Well, now it turns out that Covalent is less reliable than Ethplorer. In fact, Ethplorer came back with a bug fix recently.
So since it seems that I shouldn't rely on either of them 100%, I'm now changing the code to use them both. Distributing my risks. Dogfooding my own mantras. If one fails, I'll just use the other. Hopefully that'll solve the problem for good.
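As a rough illustration of the "if one fails, use the other" idea, here is a minimal Python sketch. It is not the crawler's actual code: `fetch_with_fallback` and the provider callables are hypothetical names, and each provider is assumed to return the raw response body as text. A non-JSON body is treated like any other failure.

```python
import json
from typing import Any, Callable, Sequence


def fetch_with_fallback(asset: str, providers: Sequence[Callable[[str], str]]) -> Any:
    """Return the parsed JSON from the first provider that answers properly."""
    errors = []
    for provider in providers:
        try:
            # Hypothetical contract: a provider returns the raw body as text.
            return json.loads(provider(asset))
        except (OSError, ValueError) as err:
            # OSError covers network failures; ValueError covers non-JSON
            # bodies (json.JSONDecodeError is a subclass of ValueError).
            errors.append(err)
    raise RuntimeError(f"all providers failed for {asset}: {errors}")
```

The order of `providers` decides which service is preferred, so swapping the primary provider only requires changing the list.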
That's all for now. Planning to do some further updates this week.
Tonight, the crawler broke when our service provider Covalent returned a non-JSON response. I fixed it by catching that error and restarting the crawl.
In OceanDAO Round 3, I announced building towards a graph that shows the index's historical performance. But for the last month or so, most of the changelog I published on the blog was about improving database queries. Indeed, I had to adjust all of those to show the following graph today:
Here, you can see a first try at displaying the index's performance from February 15, 2021, to March 15, 2021. The x-axis shows the dates of measurement, the y-axis the index's price in EUR. Please note that I haven't had time to confirm its correctness yet. However, it looks roughly correct. So how did I create this graph?
target = 100 (in EUR)
relative_score = score / SUM(scores)
share = relative_score * target / price
share represents the number of tokens the index holds for a given data set. After buying a target of 100€ in tokens, all I do is note down all token prices daily. By summing all token prices per day, I get the index value per day. And those values, I plotted in this graph.
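The three formulas above can be sketched in a few lines of Python. This is only an illustration under assumptions: the token names, scores and prices are made up, and the daily index value is interpreted as the sum of `share * price` over all held tokens.

```python
def buy_shares(scores, prices, target=100.0):
    """Split a target budget (in EUR) across tokens, weighted by score."""
    total_score = sum(scores.values())
    return {
        token: (scores[token] / total_score) * target / prices[token]
        for token in scores
    }


def index_value(shares, prices):
    """Value of the index (in EUR) for one day's token prices."""
    return sum(shares[token] * prices[token] for token in shares)


# Hypothetical example data, not taken from rugpullindex.com.
scores = {"TOKEN-A": 0.6, "TOKEN-B": 0.2}
day_one_prices = {"TOKEN-A": 2.0, "TOKEN-B": 0.5}
shares = buy_shares(scores, day_one_prices)
```

On the day of purchase, `index_value(shares, day_one_prices)` equals the target of 100 EUR by construction; on later days, only the prices change while the shares stay fixed.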
I know, for now, that might not sound very easy. But I have plans to improve this communication. Anyways, that's today's update.
PS: Another change I made today:
This whole deal with the service provider is turning into a bit of a disaster. Since tonight, it's returning a 400 error for even more assets. I've received a response to the email that I sent to support. "We will investigate the issue and fix it in the case of a bug.", they told me on Thursday.
The whole thing is starting to frustrate me. I knew the risk of being dependent on a third-party service provider. And I already had plans for my own crawler in place. I feel I'm quite unlucky that this is happening now. But it's not in my control and so I'm currently trying to fix the problem in another way.
I've thought about building a crawler myself now. But I don't think I'll be quick enough. Maybe there are other providers with similar functionality that I could use.
I'll keep you updated.
I researched online and found a similar provider. I can't speak to its reliability either, but using it addresses the problem for now. I've deleted the crawl of tonight and re-crawled. The website now displays the correct ranking again. The issue is resolved.
Tonight the crawler threw an error when retrieving the top holders of
0x5e9939f6D959ffE9B328243DfaDBEED9C46ac197 (token: EXCANE-93). Below is an
image of the API service's logs.
You can see that the request stopped working tonight and instead threw a 400 error. For now, I've added an exception route that allows the crawler to continue when receiving such an error.
I've reached out to the service's support too. For the time being though it's likely that EXCANE-93's information is displayed incorrectly. I'll keep you updated.
On Friday, March 5, 2021, OceanDAO's round 3 vote finished. If you browsed the website last week, you might have noticed the yellow call-for-votes I had put on the front page. I can't prove that it had much of an effect on the voting outcome, but I can say that there's an essential difference between the votes of rounds two and three. Let's look at the data.
In round 2, 63.08k OCEAN tokens or three addresses voted for rugpullindex. That's 5.28% of all votes. This time around, 233.54k OCEAN tokens or two addresses voted, making it 7.73% of all tickets that voted (+2.48 percentage points, or an increase of 47% relative to round 2).
I doubt that we can draw many insights from the above-presented data. Still, in the spirit of gathering data for self-improvement and recording some insights for posterity, let's look at what deliberate optimizations I made between rounds 2 and 3:
Lastly, I'd like to thank all of my voters! Your vote allows me to spend quality time on issues that I think are worth exploring, thinking about, and fixing. Thank you! That's all for today.
With regards to building rugpullindex.com, there are currently two problems bugging me. One is that gas prices on the main net are insanely high right now. And two is that the Ethereum front end space has become even more hostile than what I was used to before.
In a recent post over on my blog, I've made the argument that "Ethereum isn't fun anymore" and that "web3 is a stupid idea". Though I've earned some criticism for these posts, I'd now like to double down. I have an alternative vision for web3. Purely from a pragmatic, architectural point of view.
I've written it in long-form over in the Ethereum/EIPs issue section already. We need to start thinking practically about light clients now. Ironically, full nodes are costing developers real dollars today, and building truly decentralized applications is hardly possible anymore without a credit card.
I guess nobody designed the Ethereum protocol with light clients in mind. Still, I think there are small fixes, applied here and there, that could help dramatically improve user experiences in web3's front ends. So what's the plan?
Just recently, WebRTC was made a W3C and IETF standard. WebRTC (or Web Real-Time Communication) is a concept for sharing data directly between users' web browsers without going through middlemen like servers. "Over the past year, WebRTC has seen a 100X increase of usage in Chrome due to increased video calling from within the browser.", the article states. But WebRTC can be used for more than distributing video. Reasonably, we can use it to spread any data. And one kind of data that I've ranted about not being distributed well enough is that of the Ethereum blockchain.
WebTorrents allow us to download torrent files directly from the web. instant.io, for example, enables a user to paste in a magnet link to download it within the browser instantly. A client could now easily send a magnet link to start syndicating files.
In general, torrents have a rather bad reputation, mainly as they've been a driver of piracy in the past. However, speaking of their technical properties, torrents are one of the coolest technologies around.
So how do WebRTC, WebTorrents and Web3 fit together?
WebTorrent utilizes WebRTC in browser environments. It can fall back to webtorrent-hybrid for server-side usage. What's fantastic is that WebTorrent has a distributed hash table built-in. It even allows specifying a custom hash function. So what's the plan?
For now, the plan is to democratize access to blockchain data for regular web3 apps again. The first step towards this will be creating a lean component that we can use with web3.js. Its goal is to cache and store all requests from web3.js that have to do with a full transaction or an entire block. We will await the response to these requests, cache it, and offer it for download on WebTorrent via a custom DHT.
If a second client comes along, each request it makes towards the full node's RPC endpoint will be intercepted and will consult WebTorrent's DHT first. In case the retrieval of a transaction is possible via torrents, it will make no RPC endpoint call. That is good for a few reasons:
I'm not sure if I'll handle this project as part of rugpullindex.com. However, I only had the idea for it because of rugpullindex.com. In any case, I think building the project shouldn't be too much of a hassle as WebTorrent comes with batteries included. As a start, I'll attempt to create a library that can bootstrap the Ethereum WebTorrent network for sharing transactions and blocks. Then, I'll build a simple bootstrapping node capable of talking back to an archive node for transactions or blocks that may be missing in the DHT.
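The lookup order described here (shared store first, full node only on a miss) can be sketched independently of WebTorrent. The following Python sketch is only an illustration: `CachingProvider` is a hypothetical name, a plain dictionary stands in for the DHT, and `rpc_call` stands in for a request to a full or archive node.

```python
from typing import Callable, Dict, Optional


class CachingProvider:
    """Answer block/transaction lookups from a shared store before using RPC."""

    def __init__(
        self,
        rpc_call: Callable[[str, str], dict],
        dht: Optional[Dict[str, dict]] = None,
    ) -> None:
        self.rpc_call = rpc_call
        # A dict is only a stand-in for the distributed hash table.
        self.dht = {} if dht is None else dht
        self.rpc_requests = 0

    def get(self, method: str, key: str) -> dict:
        cache_key = f"{method}:{key}"
        if cache_key in self.dht:
            # Hit: no request reaches the RPC endpoint.
            return self.dht[cache_key]
        # Miss: ask the node once, then share the result with other clients.
        self.rpc_requests += 1
        result = self.rpc_call(method, key)
        self.dht[cache_key] = result
        return result
```

If two clients share the same store, the second client's lookup for an already fetched transaction never touches the RPC endpoint. Caching like this is only safe for data that no longer changes, which is why the sketch is meant for full transactions and entire blocks rather than for mutable state.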
Then, I think it's a question of whether the idea is accepted and used by the Ethereum community. However, since such a web3 provider could significantly reduce the number of requests a dapp makes daily, I could imagine there being a willingness to give it a try.
And that's how I want to contribute to scaling Ethereum for now. I hope you enjoyed reading. Feel free to let me know your thoughts by reaching out to me. My email is on my blog.
That's all for today.
On Feb 18, 2021, the maintainer of the "Oceancap - Datapool Evaluation and Charting" (ADASTA-60) data set tweeted:
1.) We decided to close our Oceancap pool on 21/02 due to the market situation. We are pretty sure that @oceanprotocol is working hard on preparing an updated Marketplace in the near future. We are waiting on the sidelines and take a break for now.— Oceancap - Datapool Evaluation and Charting (@OCharting) February 18, 2021
Since rugpullindex.com listed ADASTA-60 in its TOP 25 index for a while, I was curious how the ranking algorithm would react to the announcement. Remember, the algorithm works "autonomously" and isn't capable of comprehending the statement. Instead, it rates each data set by its market's performance. Our thesis is that if a data set's market is strong, its value is high too and vice versa.
Here is ADASTA-60's performance within the context of the announcement:
|Date||Score||Gini||Liquidity (OCEAN)||Price (OCEAN)|
Looking at the market's data, we can see the following:
rugpullindex.com's initial thesis that markets are a proxy for data sets has found some evidence in this particular case. rugpullindex.com successfully forecasted an investment risk (Gini-Index close to 1) before it manifested itself. Its algorithm is now automatically decreasing ADASTA-60's stake as the market reacts to the announcement.
I find this result exciting as it's the first time we can see the collected data and my work in action. 🥳
In the future, I want users to gain the same insights I was able to acquire today. I'm excited to continue working on that.
As announced on Feb 12, 2021, liquidity and price are now displayed in EUR. However, EUR values are not yet used within the ranking algorithm.
Midnight: After months, I made some changes to the crawler again, which led to the page being down for the last two nights. The reason was a bug in the price crawler.
I was trying to get OCEAN's current EUR price and I was using Coingecko's historical API, which didn't send back any results (because it's "historical" and not "present" time). The crawler is now using Coingecko's simple API to get the price.
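For readers who want to try this themselves, here is a minimal Python sketch of a request against Coingecko's simple price endpoint. It is not the crawler's code; the coin id `ocean-protocol` and the helper names are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.coingecko.com/api/v3/simple/price"


def price_url(coin_id="ocean-protocol", currency="eur"):
    """Build the request URL; the default coin id is an assumption."""
    return f"{API}?{urlencode({'ids': coin_id, 'vs_currencies': currency})}"


def parse_price(body, coin_id="ocean-protocol", currency="eur"):
    """Extract the price from a body shaped like {"<coin id>": {"<currency>": 0.5}}."""
    return float(json.loads(body)[coin_id][currency])


def fetch_price(coin_id="ocean-protocol", currency="eur"):
    """Fetch the current price over the network (needs internet access)."""
    with urlopen(price_url(coin_id, currency), timeout=10) as response:
        return parse_price(response.read().decode("utf-8"), coin_id, currency)
```

Keeping URL building and parsing separate from the network call means both can be covered by tests without hitting the API.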
A few reflections on what I learned by having to open my laptop before breakfast and before going to bed on a Saturday:
Working on a website that always displays new information is fun. I check rugpullindex.com myself daily. I like the feeling of gardening the website. But soon I want to find ways to improve upon the above-mentioned issues. It may just be a matter of improving the crawler's tests.
Today marks an important day in the life of rugpullindex.com and OCEAN. When I was trying to compartmentalize the crawler's myriad subqueries, I noticed that, as intended, all data sets are normalized based on the all-time highest liquidity a data set pool reached.
What I had neglected was that I used OCEAN as the unit of liquidity. It makes no sense, though, as the goal is to compare any data set relative to the all-time best performing data set. With a fluctuating token, however, this may not work well.
Consider the data set QUICRA-0, which had 499,296 OCEAN in its pool yesterday. Assuming that OCEAN/EUR traded at 0.5 EUR yesterday, QUICRA-0 had roughly 250,000 EUR liquidity in its pool. Now, consider that today the price of OCEAN increased by another 0.5 EUR to 1 EUR, but no change has occurred in QUICRA-0's liquidity pool. It means that while the number of OCEAN backing QUICRA-0 didn't change, its performance increased as the price of OCEAN doubled. Compared to yesterday, QUICRA-0 is doing twice as well!
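The QUICRA-0 example can be checked with a tiny Python sketch. The function name is hypothetical, and the OCEAN/EUR prices are the assumed ones from the example, not real market data.

```python
def liquidity_in_eur(ocean_amount, ocean_eur_price):
    """Convert a pool's OCEAN liquidity into EUR at a given OCEAN/EUR price."""
    return ocean_amount * ocean_eur_price


pool_ocean = 499_296  # OCEAN in QUICRA-0's pool, unchanged on both days
yesterday = liquidity_in_eur(pool_ocean, 0.5)  # 249648.0 EUR
today = liquidity_in_eur(pool_ocean, 1.0)      # 499296.0 EUR
```

The pool holds the same number of tokens on both days, yet its EUR liquidity doubles, which is exactly the effect a liquidity measure in OCEAN cannot show.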
Hence, I plan to measure a data set's liquidity now in fiat or specifically EUR. I've already finished the adjustment of the crawler. I wasn't able to finish integrating the change into the UI. But once the update is live, I'll inform you about it in detail.
👋 Today marks the first day that I'm "getting paid" for working on rugpullindex.com. It's because I came in seventh place in OceanDAO's round 2 of grant proposals and was rewarded 10k OCEAN. My original plan was to use the DAO's grant as a freelance budget to work on rugpullindex.com properly. Hence, I swapped them to USDC.
Having a stable supply of digital currency now means I can "invoice" rugpullindex for the work I'm doing. It's really just a fancy way of doing accounting. There's no official company or anything. Still, it's a big step as it means that I'm now able to justify spending time on the project during "my working hours."
And it shows because I've already been working on it for a day. I've expanded the navigation and slimmed down the landing page. I've done it to get better results on PageSpeed Insights and make rugpullindex.com perform better in search engine results. As a result, there's now an about page and this blog. I'm planning to deprecate the old /changelog.txt.
Another SEO-thing I've done is that I've added a /sitemap.xml for crawlers. I'm tracking the website's performance on Google's Search Console now too. My plan is to make the website more informative over time.
And that's all I've to say for today. I hope you like the changes. And I also wanted to thank everyone that voted for me in the OceanDAO too. Thanks!
Hoping to see you around here soon again.
Wow, it's been a while since I wrote something here. Still, I was busy thinking about next steps for rugpullindex.com. Mainly, about receiving funding to be able to continue the project.
And, indeed, I'm recognizing a promising opportunity ahead with Ocean Protocol's "OceanDAO" having its second grants funding round on Feb 1, 2021. On Monday, it led me to write a first draft for a grants proposal. OceanDAO recommends submitting an "Expected ROI calculation" in the grants proposal to make voters understand the potential and future returns of the project. However, it turned out that DeFi Pulse Index, which I used as a comparison, isn't able to capture a significant market share within the DeFi ecosystem (0.03% or $55M). When applying the percentage to rugpullindex, the prospect became even bleaker, as 0.03% of $600k would only amount to $183 of market capture for rugpullindex.com.
Even though it did disappoint me that the math wasn't working out, I'm still as bullish as ever about the project. Especially as I recently read in one of Matt Levine's "Money Stuff" newsletter posts that traditional index funds can become huge antitrust problems as soon as they start to hold majority shares in certain market segments. When, for example, the S&P500 is suddenly capable of voting on board decisions of FAANG (Facebook, Amazon, Apple, Netflix, Google), I think it's no surprise that they wouldn't incite any of those companies against each other. After all, that could lead to a decrease in the index's value.
To me, that truly sounds like an antiquated problem. Technology already allows sensing a crowd's opinion. Within blockchain, such governance scenarios have long been a topic of discussion. Actually, they work today. And that's why I think that building indexes on blockchains is a cool problem that can address real-life problems.
In conclusion, I would like to say that I'm still eager to continue development here. I hope to receive a grant. So if you're reading this, make sure to vote!
That's all. Have a nice day.
To increase the virality of the service, I've decided that I want to have some type of badge for data set providers. I ended up using shields.io. By visiting the FAQ, you can now add a badge for your own data set. It's a beta feature that I haven't tested too much. So I'm curious how it goes.
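To give an idea of how such a badge can be built, here is a small Python sketch that assembles a shields.io static badge URL. The label, message and color are made up for illustration and are not necessarily what the FAQ's badge uses; the escaping follows shields.io's static badge rules, where a dash is written as `--`, an underscore as `__` and a space as `_`.

```python
from urllib.parse import quote


def _escape(text):
    """Escape one badge segment for a shields.io static badge path."""
    escaped = text.replace("-", "--").replace("_", "__").replace(" ", "_")
    return quote(escaped, safe="")


def badge_url(label, message, color="blue"):
    """Build a static badge URL; label and message are free-form text."""
    return f"https://img.shields.io/badge/{_escape(label)}-{_escape(message)}-{color}"


# Hypothetical badge content for a data set.
url = badge_url("rugpullindex", "QUICRA-0 0.88")
```

The resulting URL can be embedded as an image in a README or on a data set's description page.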
Released the rugpullindex.com launch blog post on my personal website: https://timdaub.github.io/2020/12/11/rugpullindex/
It got lots of attention which made me happy. Lots of people have reached out since then.
This morning, when I had my coffee in the park, I thought again about what I wrote last week regarding the inclusion of liquidity into my risk model. I'm specifically referring to the changelog.txt entry on the 30/11/2020, where I proposed to use the absolute currency value of liquidity within a pool to multiply it with the Gini score.
Thinking about it again, I realized that I no longer like the approach I proposed then. The reason is that using e.g. the EUR value of a pool's liquidity in a multiplication seems fairly arbitrary. Why e.g.
After all, the Gini score and each market's liquidity are independently-provisioned quality measurements. Hence, this morning, I started thinking about how to improve what I proposed last week.
I believe that a relative quality measure that is a combination of liquidity and equality distribution is still useful for investors. I think it should not be denoted in a commonly known unit, unless it makes a specific quality statement about it.
For example, in the future, I could imagine a quality measure called "Safe liquidity" that is denoted in OCEAN, EUR or USD and that gives information about the absolute amount of liquidity that is safely distributed within a pool.
However, for now I'm not interested in that measure. Instead, I'd like to use a comprehensive and relative measure of liquidity over all markets as a measure of an individual pool's liquidity. Actually, my friend Jost Arndt proposed a simple algorithm to find a relative measure for all pools' liquidity:
His argument was that now, since all pools' liquidity is within the boundaries of , this measure could be used to find an overall score s to rank all data sets:
The properties of this model are great because:
However, I'm not only a fan of the algorithm's properties. From the get-go of this project, I've been convinced that a simple measure is key for the meaningfulness and utility of the index. I believe that the above formula passes those criteria. Hence, for the upcoming weeks, I'm planning to integrate it into the website.
And that's all for today's thoughts on rugpullindex.com. If you've found this entry useful or have feedback, feel free to reach out via [email protected]
The root endpoint / now includes a "Cache-Control" header with a maxAge
that expires around the time of rugpullindex.com's daily crawl. This means
that a user's browser is now caching the site. Additionally, this allows a
CDN or reverse proxy to cache the site too. For now, I've configured my
reverse proxy to cache according to "Cache-Control" headers, which speeds up
page loads significantly. Since statically-cached content is now served for
most of the day, this should allow handling lots of traffic too.
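One way to derive such a max-age is to count the seconds until the next crawl. The Python sketch below only illustrates the idea: the crawl hour (midnight) and the function names are assumptions, not the site's actual configuration.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_next_crawl(now, crawl_hour=0):
    """Seconds from `now` until the next daily crawl (assumed crawl hour)."""
    next_crawl = now.replace(hour=crawl_hour, minute=0, second=0, microsecond=0)
    if next_crawl <= now:
        # Today's crawl time has passed, so the next one is tomorrow.
        next_crawl += timedelta(days=1)
    return int((next_crawl - now).total_seconds())


def cache_control_header(now, crawl_hour=0):
    """Header value that lets browsers and shared caches keep the page until the crawl."""
    return f"public, max-age={seconds_until_next_crawl(now, crawl_hour)}"


evening = datetime(2020, 12, 1, 18, 0, tzinfo=timezone.utc)
header = cache_control_header(evening)  # "public, max-age=21600"
```

Because the max-age shrinks as the crawl approaches, cached copies expire at roughly the moment new data becomes available instead of a fixed interval after each request.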
Currently, I'm still thinking a lot about rugpullindex.com and how to grow its audience. I believe that in the future, it will be really important to be able to automatically filter and sort blockchain-based markets on some sort of metric, similar to how the Web is sorted by algorithms today (social media algorithms, Google's page-rank, etc.).
In terms of improving the site in the short term, I'm hence driven to do two things in particular:
Regarding (1), improving the scoring method, I already had a particular idea that I'd like to motivate briefly.
Most decentralized exchanges using automated market makers currently use liquidity to measure a pool's overall performance. However, as we've discussed already, this ignores the fact that distinct liquidity can have distinct quality. As we've assumed from the beginning, the distribution of liquidity shares within the pool can be used as a qualitative metric. Some examples:
Hence, instead of sorting the index only by a pool's liquidity distribution, I'm now thinking of using the score as a weight on the pool's liquidity:
For a pool like TREPEL-36, this would mean the following (values from today): At a score of 0.69 and a total liquidity of 40900.54€, its new score is:
whereas for TASLOB-45, having a score of 0.88 with a total liquidity of 224665.20€, it would mean:
This change, as can be seen above, would then favor large pools over small ones, while still being significantly biased towards an equal distribution of shares.
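The weighting idea can be replayed with the two pools mentioned above. This Python sketch assumes that "using the score as a weight" means a plain multiplication of score and EUR liquidity; the function name is hypothetical.

```python
def weighted_liquidity(score, liquidity_eur):
    """Weight a pool's EUR liquidity by its distribution score (assumed product)."""
    return score * liquidity_eur


# Values quoted in the text for TREPEL-36 and TASLOB-45.
trepel = weighted_liquidity(0.69, 40900.54)
taslob = weighted_liquidity(0.88, 224665.20)
```

Under this assumption, TASLOB-45 ends up far ahead of TREPEL-36, mostly because of its larger pool and only partly because of its higher score.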
If you've made it so far: Thanks for reading! And if you have feedback on this idea, feel free to contact me! My email is [email protected]
That's all for today.